Search Results: "sonne"

27 October 2012

Soeren Sonnenburg: Shogun at Google Summer of Code 2012

The summer finally came to an end (yes, in Berlin we still had 20 °C at the end of October) and, unfortunately, so did GSoC with it. This has been the second time for SHOGUN to be in GSoC. For those unfamiliar with SHOGUN - it is a very versatile machine learning toolbox that enables unified large-scale learning for a broad range of feature types and learning settings, like classification, regression, or explorative data analysis. I again played the role of an org admin and co-mentor this year and would like to take the opportunity to summarize enhancements to the toolbox and my GSoC experience.

In contrast to last year, we already required code contributions in the application phase of GSoC, i.e., a (small) patch was mandatory for your application to be considered. This reduced the number of applications we received (48 proposals from 38 students instead of 70 proposals from about 60 students last year) but also increased the overall quality of the applications. In the end we were very happy to get 8 very talented students and to have the opportunity of boosting the project thanks to their hard and awesome work. Thanks to Google for sponsoring three more students compared to last GSoC. Still, we gave one slot back to the pool, to the benefit of the Octave project (they used it very wisely and Octave will have a just-in-time compiler now, which will benefit us all!).

SHOGUN 2.0.0 is the new release of the toolbox, including of course all the new features that the students have implemented in their projects. On the one hand, modules that were already in SHOGUN have been extended or improved. For example, Jacob Walker has implemented Gaussian Processes (GPs), improving the usability of SHOGUN for regression problems. Chiyuan Zhang has contributed a framework for multiclass learning that includes state-of-the-art methods in this area such as Error-Correcting Output Coding (ECOC) and ShareBoost, among others. In addition, Evgeniy Andreev has made very important improvements w.r.t. 
the accessibility of SHOGUN. Thanks to his work with SWIG director classes, it is now possible to use Python for prototyping and to make use of that code with the same flexibility as if it had been written in the C++ core of the project. On the other hand, completely new frameworks and other functionalities have been added to the project as well. This is the case for the multitask learning and domain adaptation algorithms written by Sergey Lisitsyn and the kernel two-sample or dependence test by Heiko Strathmann. Viktor Gal has introduced latent SVMs to SHOGUN and, finally, two students have worked on the new structured output learning framework. Fernando Iglesias designed this framework, introducing structured output machines into SHOGUN, while Michal Uricar has implemented several bundle methods to solve the optimization problem of the structured output SVM.

It has been great fun and very interesting to see how the work done in different projects was put together very early, even during the GSoC period. To give just one example, involving the generic structured output framework and the accessibility improvements: it is possible to use the SWIG directors to implement the application-specific mechanisms of a structured learning problem instance in Python and then use the rest of the framework (written in C++) to solve this new problem.

Students! You all did a great job and I am more than amazed at what you all have achieved. Thank you very much and I hope some of you will stick around.

Besides all these improvements, it has been particularly challenging for me as org admin to scale the project. While I could still be deeply involved in each and every part of the project last GSoC, this was no longer possible this year. Learning to trust that your mentors are doing the job is something that didn't come easy to me. Holding roughly monthly all-hands meetings did help and so did monitoring the happiness of the students. 
I am glad that it all worked out nicely this year too. Again, I would like to mention that SHOGUN improved a lot in terms of code base and code quality. Students gave very constructive feedback about our lack of proper Vector/Matrix/String/Sparse Matrix types. We now have all these implemented, doing automagic memory garbage collection behind the scenes. We have started to transition to Eigen3 as our matrix library of choice, which made quite a number of algorithms much easier to implement. We generalized the Label framework (CLabels) to be tractable not just for classification and regression but also for multitask and structured output learning.

Finally, we have had quite a number of infrastructure improvements. Thanks to GSoC money we have a dedicated server for running the buildbot/buildslaves and website. The ML Group at TU Berlin sponsors virtual machines for building SHOGUN on Debian and Cygwin. Viktor Gal stepped up, providing buildslaves for Ubuntu and FreeBSD. Gunnar Raetsch's group is supporting Red Hat based build tests. We have Travis CI testing pull requests for breakage even before they are merged. Code quality is now monitored using LLVM's scan-build. Bernard Hernandez appeared and wrote a fancy new website for SHOGUN. A more detailed description of the achievements of each of the students follows:

4 September 2012

Gunnar Wolf: Electronic voting in Panama: Slower, more expensive, more uncertainty... Goodbye!

Panama just underwent a nasty e-voting exercise: Electronically mediated elections were held for the committee of the PRD party. It sounds simple - Even trivial! There were only 4100 authorized voters, it was geographically trivial (all set inside a stadium)... But it went up in smoke. I won't reiterate everything that happened; I'd rather direct you to our project's (the e-voting observatorium) page: News regarding Panama (for those coming from the future, search starting at 2012-08-27, and yes, it's all in Spanish, but there are free-as-in-beer translation services). Many e-vote proponents/sellers/pushers were very eagerly waiting for this election to brag about one more success... So much so that they could not just ignore it, and started rationalizing it away. Anyway, while feeding the observatorium, I came across this opinion article on the Voto Digital website, which makes quite a bit of pro-e-voting noise. I replied to it, and I think my analysis is worth sharing with you as well:
So, let's run some simple numbers, rounding as we go: The PRD vote in Panama was held for a universe of 4100 voters. It took 10 hours (instead of the planned 4), so 410 people were processed every hour. There were 40 (electronic) voting booths, so each processed about 10 people per hour. This means each person spent about 6 minutes at the booth. A manual vote in this fashion is highly parallelizable: Each of the voters can be given ballots in advance, or many of them can be let in and given the ballots in situ (depending on the electoral scheme employed). The contention time is the time it takes each voter to get near the booth and deposit his ballot (either folded or in an envelope) - And it will very rarely be more than a couple of seconds. So, given that with electronic booths parallelism cannot grow (there is a fixed number of machines) and the queues grew wildly, traditional voting would surely have fitted in the expected four hours (which were, also, estimated from their past experiences). As for counting, the slowest part of manual voting: it is also highly parallelizable. If each of the 40 booths has slightly over 100 ballots, the party personnel can easily count them in under 30 minutes. Capture and aggregation of the 40 partial results would take an extra 10 minutes, even being generous. Manual voting would have saved them around five hours, without demanding additional resources (and thus being much more economical than having to buy 40 special-purpose computers). And as an additional advantage, the physical and tangible vote proofs would remain, in case they were ever needed again.
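For anyone who wants to recheck the rounding, the figures above can be reproduced with a few lines of Python. This is only a sketch: the inputs are the numbers quoted in this post (4100 voters, 40 booths, 10 hours actual, 4 hours planned, 30 minutes of counting, 10 minutes of aggregation), and the function and key names are made up for illustration.

```python
def voting_estimate(voters=4100, booths=40, actual_hours=10, planned_hours=4,
                    counting_minutes=30, aggregation_minutes=10):
    """Recompute the rough throughput figures quoted in the post."""
    # Electronic vote as it actually happened
    voters_per_hour = voters / actual_hours
    per_booth_per_hour = voters_per_hour / booths
    minutes_per_voter = 60 / per_booth_per_hour

    # Manual vote: assumed to fit the planned voting window, followed by
    # parallel counting at each booth and aggregation of the partial results
    ballots_per_booth = voters / booths
    manual_hours = planned_hours + (counting_minutes + aggregation_minutes) / 60
    hours_saved = actual_hours - manual_hours

    return {
        "voters_per_hour": voters_per_hour,
        "per_booth_per_hour": per_booth_per_hour,
        "minutes_per_voter": minutes_per_voter,
        "ballots_per_booth": ballots_per_booth,
        "manual_hours": manual_hours,
        "hours_saved": hours_saved,
    }


if __name__ == "__main__":
    for name, value in voting_estimate().items():
        print(f"{name}: {value:.2f}")
```

With the inputs from the post this gives 410 voters per hour, 10.25 per booth per hour (just under 6 minutes each), 102.5 ballots per booth, and a manual total of 4 hours 40 minutes, which is where the "around five hours" saved comes from.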

19 May 2012

Richard Hartmann: Motherland's bosom

I read a translated poem about Russia being "the Motherland" and its vast bosom years ago. Having driven through a significant part of it, I can agree on the "vast" part... Also, as I am on a train and without access to the Internet, I will refrain from linking to a lot of pages; sorry. (Turns out I am posting this a week later, but I will still not link to stuff now; no time.)

Russia in general

Moscow

Sights

Kremlin

Our remaining time in Moscow was spent touring the usual suspects; the Kremlin is a lot less impressive in real life, the Red Square is tiny when compared to the stories I heard about it, and the Chapel of St. Basil is even more colorful and impressive in real life. Lenin's body was inaccessible because workers were building seats for the May 9th parade to the left and the right of it and they apparently thought it would be a good idea to block access to one of the main tourist attractions while doing so. A river tour of Moscow was a nice cool-off and we got to see quite a few things. We managed to see the weekly military parade within the Kremlin grounds, but it was mostly pomp and little substance. The National Treasure, which you can access with an extra ticket within the Kremlin grounds, is nice, but less impressive than the tourist guides would make you believe. That being said... There's another museum within the museum and.... Whoa... Tourists pay extra, visitors go through the only non-security-theater check I encountered in Russia, guards are armed, people can only enter and leave in batches, and the stuff which is presented is mind-boggling. Disregarding the fist-to-calf-sized chunks of gold and platinum which are still in their original form directly from the mine, there is real, actual treasure galore. Little heaps of uncut and cut diamonds, an outline of Russia filled with cut diamonds and other random "we have this stuff" displays can be found as well. Then, you have various tiaras and other jewellery made from various gems. 
Not incorporating, but largely made of. All that pales in comparison to the crown, royal apple, scepter, etc. It's hard to put the amount of tiny multi-colored light points that shine at you into words. I was just standing there, swaying back and forth to catch the moving pattern of pinpoints. It's said that this collection is equalled only by the ones in the Tower of London and the one the Shah of Iran had, and boy do I believe it.

TV Tower

Getting up there was funny. The old-style Soviet queuing system was used: "Security" for approaching the tower was multi-level, the guards see you approach along a long walkway way in advance, and the main guard shed had several small cabins separated by thick glass. So far, so menacing. But in a twist that would make Bizarro and Gary Larson proud, I was required, by means of a metal detector gate, a metal detector wand and even an x-ray machine, to remove every shred of metal and other hard objects from myself and the camera bag and put them onto a table. Once I was without anything except my clothes and the bag was completely empty, I could pass. Everything I had had to remove was just lying there, not inspected in the least, for me to stuff back into pockets and bag and to take with me. This "everything" included a Spot Messenger 2 with lots of green and red blinky lights. The guard did not even glance at it. Security theater? Security theater. The view from 364 meters down on Moscow was nice, but there was a lot of smog so I couldn't see very far. Jumping on the glass floor while looking down was a lot of fun, though.

Subway to Thiefing

I bet Christopher Nolan rode the subway in Moscow at least once. That unnerving sound you hear during several key scenes in "The Dark Knight"? Two thirds of all subways make the same sound while moving. Also, I had an encounter with a pickpocket down there; very classical, too. Guy approaches quickly, talks loudly and sounds as if it's really important (in Russian... duh... 
that's sure to keep me interested). His approach made me turn and protect my left leg pocket automatically, most likely marking the target for the tiny woman standing behind me. Now, I have to tell you something about my usual travel layout. As my normal pockets are very deep, it looks as if their content was in the leg pocket. Plus, there's an extra, hidden leg pocket where I keep the passports and train tickets. The outermost leg pocket is protected by a velcro flap, but it contains nothing of value; usually the appropriate phrasebook, local map, maybe a tissue or chewing gum. Due to this layering, the outermost pocket looks as if it's full to the brim with stuff. Also, I took pains to make it a habit to protect said leg pocket with my hand, nothing else. This looks as if that's the target, but what I am actually doing is protecting my normal pocket with my forearm. The right side is different, but the most easily accessible pocket always holds some small change. I pay from that stash but my actual wallet is well out of reach. Anyway, once the guy ran off, talking to several others, most likely marking all of them for the actual pickpockets, I wanted to enter the subway. While the Russian-style queuing took place, I felt an unusual tug at the velcro flap. I looked down and saw a tiny woman to the left of me with a jacket held over her right side with the left arm; I look up to check no one is trying to steal from my permanently assigned female, feel another tug, look the woman in the eyes, look up again and around me, look down again and she is gone. All that took maybe three seconds and I had boarded the subway after an additional two. In hindsight, it makes sense to choose the time of entry for the attack. It's crowded, you are being pushed around, and once you are in the subway, it will start moving more or less immediately while the thief remains in the station. 
In this case, she would only have gotten a grubby map of Moscow's subway and an English-Russian phrasebook, but she got nothing at all.

Moscow-Novosibirsk

Where to begin... If you think a few hours on a train are a long time, try over fifty hours. Things get so bad, you start getting land-sick while not in a moving train. You even start missing the familiar tunk-cachunk, tunk-cachunk, tunk-cachunk... of driving over rails with gaps in them when you are not moving. The defining element of the Trans-Siberian Railway is birch trees. And birch trees. And then more birch trees. You would not believe how many birch trees there are. This is made "worse" by the way the Russian Railway protects their rails. Left and right of the track, there's a cleared area of maybe ten to twenty meters, sometimes as little as three. Outside of that, they plant ten to twenty meters of birch trees, presumably to catch snow during winter. Beyond that protective perimeter, there's the normal landscape. As a result, on top of the near endless stretches of birch woods, you see most if not all scenery through a layer of birch trees. You get sick of birch trees after a few hours and you see them for days on end. Bullet points to save myself some typing and you some reading...

Novosibirsk

The non-existent hostel

We arrived at ~0200 local and made our way to the hostel we had booked a room with. Walking to the correct address, we saw several signs but they all turned out to be for a police station and some other state agency. We walked back, forth, double-checked, triple-checked: no hostel. We then walked around the building through some not-quite-nice back alleys, but other than a few entries to private flats, there was nothing. Thankfully, the booking slip included a number which we called and after at least twenty rings (no kidding), when I had given up and wanted to hang up, it stopped ringing. Dead silence. After maybe ten seconds, someone started talking in Russian. 
I asked him if he spoke English and told him that we could not find the hostel. He mumbled something about being sorry and that we should wait, he would come down. Fast forward a minute or two and someone walked towards us. Again, he mumbled about being sorry, that the hostel "did not work" at the moment and that we would need to sleep in his private apartment. He ushered us into some back alley entrance, into his flat, and proceeded to remove the sheets from the couch on which he had slept; after putting on new sheets, we had our "hostel" bed, ready to sleep on. We briefly considered whether he would murder us in our sleep, but he and I even got to talking a bit. Over cheese, sausage and rum (at 0300), he admitted that the hostel did not exist and he merely planned to turn his flat into a hostel for the summer while he and his family moved into their summer house (the Russian term for which escapes me at the moment) in the countryside. He had accepted our reservation as he thought he would be finished by that time. He did not even get started, though. While he sent us an overbooking notice through booking.com two days before, we were on the train at that time, so... booking.com even called him to check what had happened to us as we did not book another place through them. Good customer service/protection, that. Next morning, he didn't even want to take our money (we paid anyway) and, as a means of compensation, drove us into the city in the morning and to a train museum well outside the city limits, one of the fabled scientist cities, and a large lake which everyone in Novosibirsk claims is an ocean, in the afternoon.

Foreigners, foreigners!

All in all, Novosibirsk was relatively uneventful, save for one bizarre episode. We took our lunch in a local fast food joint (why do all the good stories happen there, and not at the various truly local places?) and threw the cashier our well-rehearsed "Niet Russkie; anglisky?" 
with phrasebook in hand and he actually understood a few words of English (beef, chicken, fries). We told him, in our worst Russian, that we were from Germany, wished him a nice day and went to sit down. A few minutes later, a girl approached us, literally hopping from one foot to the other and wringing her hands. She told us that the cashier had told her that we spoke English and asked if it would be OK if she talked to us. We suspected some sort of elaborate ruse, but went with it. Turns out, she had English at school and really wanted someone to practice English on. Two young men passed our table and exchanged a few words with her, sitting down out of sight. When she told us that she had to leave now and asked if it would be OK if the two boys joined us, we suspected a ruse yet again. But those two were law students, one with a minor in English and one with a minor in German; both of them also extremely nervous, asking us if we would talk to them. When they had to leave, they told us that the three of them worked at the burger joint and that their shift was just about to start when the news that foreigners were here spread amongst staff like wildfire. The girl stopped by several times in between cleaning tables, getting in a sentence or two before being cussed at by her supervisor. All in all, this took about twenty minutes and seeing three people so nervous and grateful to talk with us felt beyond absurd. On the other hand, not a single traveller we met even considered stopping in Novosibirsk during their transit so there really does seem to be a shortage of non-Russians there. Weird, and memorable.

Novosibirsk-Irkutsk

Irkutsk / Listvianka / Lake Baikal

Listvianka

Aah, Lake Baikal... the oldest and deepest lake on Earth, which holds a fifth of the global non-salt water reserves; a must-see in my book. 
Quad tours at break-neck speeds, dry-suit diving with Russian regulators, walking barefoot in between and across drift ice that made its way onto the shore, and extended hiking around the lake's coast... All of which I could not do because I was ill and had to spend two solid days in bed. The draft from the open window in between Novosibirsk and Irkutsk was enough to give me a rather bad cold which peaked at Lake Baikal. Still, the area was lovely and we were glad to be out of a train and able to unpack our stuff without having to repack immediately for once. I am not sure where my current losing streak with regards to diving is coming from (Grimsey, diving north of the Arctic circle with birds that plummet into the water and hunt fish: the only guy who does this was on the Icelandic mainland that day; Svalbard, diving north of the Arctic circle in permanent darkness: the few people who do this privately did not reply while I was there; Baikal, oldest, deepest, largest lake on Earth: ill), but I will most likely return to Russia for a week of ice diving in Lake Baikal next winter or the one after that. As an aside, I saw several people walking to Lake Baikal with buckets to get their water. Other people got it from a well which was still half frozen. If you have running water, consider yourself lucky...

Irkutsk

Nice city, largely uneventful. The farther east you get within Russia, the more normal women look. In Moscow, just as in Paris, they are way over-dressed and even service personnel will walk in high heels. Thankfully, I don't have to wear heels, but for the other males out there: Walking and standing in these things hurts and thus most if not all people who stand and walk for a living have flat shoes. 
We happened upon preparations for a military parade, complete with cordon, viewing platforms, at least half a dozen TV cameras etc., but were not sure if it would start soon enough for us to catch our train. We asked someone who told us it would start at 2100 local. At 1945 local it seemed about to start, and sure enough at 1955 sharp, the whole thing got under way. About a dozen groups of 50-100 people each, all in their own, respective uniforms, stood against one side of a cordoned-off street and several higher-ups on the other side. Two highest-ups shouted into microphones and the throng of people on the other side shouted back answers. Then, the two highest-ups stood in the back of a jeep each and drove past said throng, stopping in front of each group, shouting into microphones mounted in the back of the jeeps, and the groups shouted back once again. After that, all groups marched around the make-shift plaza once, saluting the higher-ups. Once they were done, and they took ages, two trucks drove by with soldiers jumping out of the moving trucks and moving into crouching positions. They ran around in a circle a few times and engaged in pretend hand-to-hand combat. I am sure they are skilled at whatever style they wanted to show, but they were overdoing things so badly, they were funny, not imposing. When they jumped over some barriers, the barriers fell to pieces and everyone scrambled to make it look as if that was part of the show. While carrying off the gear, it fell into further pieces, which was even funnier. An armoured personnel carrier ended the show; several tougher-looking guys jumped off of that one and their mock combat involved fully automatic fire (of blanks), several flashbangs, smoke grenades and, to top things off, the machine gun mounted on the APC mowing down the opposing team with blanks. I never witnessed a "real" military parade in person but this one was somewhat disappointing. 
On the one hand, there was a distinct lack of ballistic missile carriers and tanks like you see in movies, documentaries and games, on the other hand, the whole thing had a make-do feeling to it. The cordoning police had designated spots to stand on, yet walked around. They were standing to attention, yet checking their cell phones. Several people in one uniformed group were wearing track suits and jeans. Another uniformed guy had a grocery bag with him; yet another one was carrying a huge water bottle. Bikers zig-zagged through the cordon and when the whole show was just about to wrap up the police finally started putting up barriers around the unmoving pedestrians, not blocking the bikers. One little girl was standing well within the cordoned area, watching with big eyes and after she did not react to the police talking to her, they just built the barriers in a curve around her. And to top it all off, some guy with a cane walked all through the parade with his personal camcorder, trying to direct the whole show while being ignored by everyone. Still, I am sure he managed to mess up some otherwise perfectly good TV scenes.

Irkutsk-Russian border

TL;DR: 3000 kilometers of birch trees

26 April 2012

Soeren Sonnenburg: GSoC2012 Accepted Students

Shogun has received an outstanding 9 slots in this year's Google Summer of Code. Thanks to Google and the hard work of our mentors in ranking the applicants and discussing with them, we were able to accept 8 talented students. (We had to return one slot to the pool due to not having enough mentors for all tasks. We indicated that Octave should get the slot, so let's hope they got it and will spend it well :). Each of the students will work on rather challenging machine learning topics. The accepted topics and the students attacking them with the help of their mentors are:

We are excited to have them all in the team! Happy hacking!

9 April 2012

Soeren Sonnenburg: Shogun Student Applications Statistics for Google Summer of Code 2012

A few weeks have passed since SHOGUN was accepted for Google Summer of Code 2012. The student application deadline was Easter Friday (April 6) and shogun received 48 proposals from 38 students. Some more detailed stats can be found in the figure below. This is a drastic drop compared with last year (about 60 students submitted 70 proposals). However, this drop can easily be explained: To even apply we required a small patch, which is a big hurdle.
  1. One has to get shogun to compile (possibly only easy under Debian; for Cygwin/Mac OS X you have to invest quite some time to even get all of its dependencies).
  2. One has to become familiar with git and GitHub and learn how to issue a pull request.
  3. And finally, one has to understand enough of machine learning and shogun's source code to be able to fix a bug or implement some baseline machine learning method.
Nevertheless, about a dozen proposals didn't come with a patch (even though the instructions page states that this is required) - an easy reject. In the end the quality of proposals increased a lot and we have many very strong candidates this year. Now we will have to wait to see how many slots we will receive before we can finally start the fun :-)

18 March 2012

Soeren Sonnenburg: Shogun got accepted at Google Summer of Code 2012

SHOGUN has been accepted for Google Summer of Code 2012. SHOGUN is a machine learning toolbox, which is designed for unified large-scale learning for a broad range of feature types and learning settings. It offers a considerable number of machine learning models such as support vector machines for classification and regression, hidden Markov models, multiple kernel learning, linear discriminant analysis, linear programming machines, and perceptrons. Most of the specific algorithms are able to deal with several different data classes, including dense and sparse vectors and sequences using floating point or discrete data types. We have used this toolbox in several applications from computational biology, some of them coming with no less than 10 million training examples and others with 7 billion test examples. With more than a thousand installations worldwide, SHOGUN is already widely adopted in the machine learning community and beyond. SHOGUN is implemented in C++ and interfaces to all important languages like MATLAB, R, Octave, Python, Lua, Java, C#, Ruby and has a stand-alone command line interface. The source code is freely available under the GNU General Public License, Version 3 at http://www.shogun-toolbox.org. During Summer of Code 2012 we are looking to extend the library in three different ways:
  1. Improving accessibility to shogun by improving i/o support (more file formats) and mloss.org/mldata.org integration.
  2. Framework improvements (frameworks for regression, multiclass, structured output problems, quadratic programming solvers).
  3. Integration of existing and new machine learning algorithms.
Check out our ideas list and if you are a talented student, consider applying!

24 October 2011

Soeren Sonnenburg: Google Summer of Code Mentors Summit 2011

Google mentors summit 2011 is over. It is hard to believe but we had way more discussions about open science, open data, open source, and open access than at all machine learning open source software workshops combined. Well, and the number of participants of the science sessions was unexpectedly high too. One of the interesting suggestions was the science code manifesto, suggesting among other things that all code written specifically for a paper must be available to the reviewers and readers of the paper. I think that this can nicely be extended to data too and really should receive wide support! Besides such high goals, the summit was a good occasion to get in touch with core developers of git, opencv, llvm/clang, orange, octave, python... and discuss concrete collaborations, issues and bugs that the shogun toolbox is triggering. So yes, I already fixed some, with more to come. Thanks, Google, for sponsoring this - it's been a very nice event.

1 October 2011

Sergio Talens-Oliag: Static website generators

Last month I was supposed to work on an OpenStack-related project, but for administrative reasons it has been delayed and I've tried to do small tasks that I could finish quickly, so that I can start work on the main project when the issues get solved. As the delay has been longer than expected, last Wednesday I realized that over the last weeks I did a lot of small system administration tasks: With all the changes I did, I noticed that I had to do something with our Intranet server; it is just a reverse proxy for a lot of different web services and its main page was one static HTML page with links to them, nothing else. In the long term maybe we will replace it with something based on Drupal or Liferay, but for now I just wanted something to organize the links and provide some information about the services for new users without having to write HTML (I really like Agile Documentation Tools that let me focus on the content and forget about the markup), and started to look at some of them. My first idea was to use ikiwiki, as it has all the features I was looking for: I can use Markdown or reStructuredText to write the contents, the source pages are easily handled in a Version Control System, it supports the use of templates for the HTML, etc., but it seemed to me that using ikiwiki was like killing flies with a cannon (that's a Spanish saying, I guess it's easy to understand in English, no?) and I decided to review other tools to build static web sites. To make a long story short, I selected some tools that met my requirements and looked nice on their demo sites; after my first review I thought that Hyde was going to be my bet, as it uses technologies I'm already familiar with, but after trying it I saw that I was going to have a problem with documentation (the current Hyde version lacks it) and it was going to be more complicated than using ikiwiki. 
Before giving up I decided to review simpler tools, just in case, and after looking at some of them I ended up using poole, a simple python script (the source is just one file and it only requires python-markdown to work). Before moving to the content I tried to adapt a couple of free themes to be used by the tool, but I didn't like the result, so I went back to the plain style provided by the tool and added a logo and a background. With that simple look and feel I started to work on the content, splitting it into eight markdown files and a python macro to include a file that has all the links used on the site. While trying to make the main page look good I noticed how little I know about CSS, but using search engines I was able to build a two-column block into the main page and publish the contents, and with the help of some CSS-enabled co-workers I changed the look and feel of the site in about 30 minutes. In summary, if you want a really simple website, you know a little bit of python and don't want to spend much time learning how to use a website generator, then Poole is a good option. If you want something more complex I still think that ikiwiki is a good option, but YMMV.

7 September 2011

Soeren Sonnenburg: Shogun at Google Summer of Code 2011

Google Summer of Code 2011 gave a big boost to the development of the shogun machine learning toolbox. In case you have never heard of shogun or machine learning: Machine learning involves algorithms that do "intelligent" and even automatic data processing and is nowadays used everywhere, e.g. to do face detection in your camera, compress the speech in your mobile phone, power the recommendations in your favourite online shop, or predict the solubility of molecules in water and the location of genes in humans, to name just a few examples.

Interested? Then you should give it a try. Some very simple examples stemming from a sub-branch of machine learning called supervised learning illustrate how objects represented by two-dimensional vectors can be classified as good or bad by learning a so-called support vector machine. I would suggest installing the python_modular interface of shogun and running the example interactive_svm_demo.py, which is also included in the source tarball. Two images illustrating the training of a support vector machine follow:

[Images: svm, svm interactive]

Now back to Google Summer of Code: Google sponsored 5 talented students who were working hard on various subjects. As a result we now have a new core developer and various new features implemented in shogun: interfaces to new languages like Java, C#, Ruby and Lua written by Baozeng, a model selection framework written by Heiko Strathmann, many dimension reduction techniques written by Sergey Lisitsyn, Gaussian Mixture Model estimation written by Alesis Novik and a full-fledged online learning framework developed by Shashwat Lal Das. All of this work has already been integrated in the newly released shogun 1.0.0. In case you want to know more about the students' projects continue reading below, but before going into more detail I would like to summarize my experience with GSoC 2011.

My Experience with Google Summer of Code

We were a first-time organization, i.e. taking part for the first time in GSoC.
Having received many, many student applications we were very happy to hear that we got at least 5 very talented students accepted, but we still had to reject about 60 students (only a 7% acceptance rate!). Doing this was an extremely tough decision for us. Each of us ended up scoring the students, and even then we had many ties. So in the end we raised the bar by requiring contributions even before the actual GSoC started. This way we already got many improvements like more complete i/o functions, nicely polished ROC and other evaluation routines, new machine learning algorithms like Gaussian naive Bayes and averaged perceptron, and many bugfixes. The quality of the contributions and the independence of the students helped us come up with the selection of the final five. I personally played the role of the administrator and (co-)mentor and scheduled regular (usually monthly) IRC meetings with mentors and students. For other org admins or mentors wanting to get into GSoC, here come my lessons learned:

Now please read on to learn about the newly implemented features:

Dimension Reduction Techniques
Sergey Lisitsyn (Mentor: Christian Widmer)

Dimensionality reduction is the process of finding a low-dimensional representation of high-dimensional data while maintaining the core essence of the data. As one of the most important practical issues of applied machine learning, it is widely used for preprocessing real data. With a strong focus on memory requirements and speed, Sergey implemented the following dimension reduction techniques:

See below for some nice illustrations of dimension reduction/embedding techniques.

[Images: isomap swissroll, rno klle, local tangent space alignment]

Cross-Validation Framework
Heiko Strathmann (Mentor: Soeren Sonnenburg)

Nearly every learning machine has parameters which have to be determined manually. Before Heiko started his project one had to manually implement cross-validation using (nested) for-loops.
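To make the "(nested) for-loops" concrete, here is a minimal sketch of hand-written n-fold cross-validation used to pick a single parameter. It does not use shogun at all: the toy 1-D k-nearest-neighbour classifier, the function names and the synthetic data are assumptions chosen only to keep the example self-contained.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (labels are 0/1)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > len(neighbours) else 0


def cross_validate(data, k, n_folds):
    """Mean accuracy of k-NN over n_folds train/test splits."""
    accuracies = []
    for fold in range(n_folds):                       # inner loop: folds
        test = data[fold::n_folds]                    # every n_folds-th point
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / n_folds


def select_k(data, candidates, n_folds=5):
    """Try every candidate parameter value and keep the best-scoring one."""
    best_k, best_acc = None, -1.0
    for k in candidates:                              # outer loop: parameter grid
        acc = cross_validate(data, k, n_folds)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Synthetic two-class data: two 1-D Gaussians with different means.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(50)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(50)]
rng.shuffle(data)

best_k, best_acc = select_k(data, candidates=[1, 3, 5, 7])
print("best k:", best_k, "mean accuracy:", best_acc)
```

Every additional parameter adds another outer loop, and the splitting logic has to be rewritten for each learner, which is the boilerplate a model selection framework is meant to remove.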
In his highly involved project Heiko extended shogun's core to register parameters and ultimately made cross-validation possible. He implemented different model selection schemes (train/validation/test split, n-fold cross-validation, stratified cross-validation, etc.) and created some examples for illustration. Note that various performance measures are available to measure how "good" a model is. The figure below shows the area under the receiver operating characteristic curve as an example.

[Figure: foo]

Interfaces to the Java, C#, Lua and Ruby Programming Languages
Baozeng (Mentors: Mikio Braun and Soeren Sonnenburg)

Baozeng implemented swig typemaps that enable the transfer of objects native to the language one wants to interface to. In his project, he added support for Java, Ruby, C# and Lua. His knowledge of swig helped us to drastically simplify shogun's typemaps for existing languages like octave and python, resolving other corner-case type issues. The addition of these typemaps brings a high-performance and versatile machine learning toolbox to these languages. It should be noted that shogun objects trained in e.g. python can be serialized to disk and then loaded from any other language, say lua or java. We hope this helps users working in multiple-language environments. Note that the syntax is very similar across all languages used; compare for yourself: various examples for all languages (python, octave, java, lua, ruby, and csharp) are available.

Large-Scale Learning Framework and Integration of Vowpal Wabbit
Shashwat Lal Das (Mentors: John Langford and Soeren Sonnenburg)

Shashwat introduced support for "streaming" features into shogun. That is, instead of shogun's traditional way of requiring all data to be in memory, features can now be streamed from e.g. disk, enabling the use of massively big data sets. He implemented support for dense and sparse vector-based input streams as well as strings, and converted existing online learning methods to use this framework.
He was particularly careful and even made it possible to emulate streaming from in-memory features. He finally integrated (parts of) vowpal wabbit, which is a very fast large-scale online learning algorithm based on SGD.

Expectation Maximization Algorithms for Gaussian Mixture Models
Alesis Novik (Mentor: Vojtech Franc)

The Expectation-Maximization algorithm is well known in the machine learning community. The goal of this project was the robust implementation of the Expectation-Maximization algorithm for Gaussian Mixture Models. Several computational tricks have been applied to address numerical and stability issues. An illustrative example of estimating a one- and two-dimensional Gaussian follows below.

[Images: 1D GMM, 2D GMM]

Final Remarks

All in all, this year's GSoC has given the SHOGUN project a great push forward and we hope that this will translate into an increased user base and numerous external contributions. Also, we hope that by providing bindings for many languages, we can provide a neutral ground for machine learning implementations and that way bring together communities centered around different programming languages. All that's left to say is that given the great experiences from this year, we'd be more than happy to participate in GSoC 2012.
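As a rough illustration of the Expectation-Maximization procedure for Gaussian mixtures discussed in the post above, here is a minimal one-dimensional sketch in plain Python. It is not shogun's implementation. The post does not say which numerical tricks were used, so the two shown here (computing responsibilities in log space with a log-sum-exp, and putting a floor under the variances) are assumptions: they are simply common choices for this kind of problem.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """log(sum(exp(v))) computed without underflow by factoring out the max."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def em_gmm_1d(xs, means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture; `means` gives the initial component means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, normalised in log space for stability.
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(lp - norm) for lp in logs])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            # Guard against an empty component (assumed safeguard).
            nj = max(sum(r[j] for r in resp), 1e-12)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)   # variance floor avoids collapse
            weights[j] = nj / len(xs)
    return weights, means, variances


# Synthetic data drawn from two well-separated Gaussians.
rng = random.Random(1)
xs = [rng.gauss(-2.0, 0.5) for _ in range(200)]
xs += [rng.gauss(3.0, 1.0) for _ in range(200)]

weights, means, variances = em_gmm_1d(xs, means=[-1.0, 1.0])
print(weights, means, variances)
```

Without the log-space normalisation, points far from every component can make all the densities underflow to zero, which turns the responsibility computation into a division by zero.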

3 September 2011

Soeren Sonnenburg: Hello World

I have finally managed to set up a blog using handcrafted HTML and Django. Having had the experience of writing mloss.org, largescale.ml.tu-berlin.de and mldata.org, my homepage here was done rather quickly (one weekend of hacking). If anyone is interested in website design, I can only recommend using Django. Work your way through the awesome tutorial or the book and get inspired by the sources of the many Django-based websites out there. Finally, keep an eye on the Django code snippets, which often contain small but useful functions and solutions to problems you might have.

29 July 2011

Jaldhar Vyas: hello_minix-i386.deb

NEW YORK, Jul 29, 2011 - PreventaConf - The crowd of thousands gathered here to attend PreventaConf 2011 were abuzz over the unveiling of the culmination of 3 years of aimless arsing about^ww^w concentrated development which has led to the porting of dpkg to the MINIX 3 operating system.

[Image: dselect on Minix]

Women fainted, men wept openly, medical personnel were kept busy dealing with the many instances of heart seizure, multisensory hallucination and spontaneous combustion which accompanied this momentous event. Much remains to be done. Still, this is a major milestone of which the Debian Minix developers are justifiably proud.

18 April 2011

Gerfried Fuchs: Wise Guys

My brother invited me to the concert of the Wise Guys, a German a cappella group. They are one of those special groups who are able to give a cheering live show and have this special tongue-in-cheek humour in a fair number of their songs. This is the selection that helps me keep my mood up, though you are invited to dig further. I hope you are able to appreciate them as much as I do. At least they are able to cheer me up a fair bit.


7 February 2011

Pau Garcia i Quiles: Luggage and RyanAir

FOSDEM ended yesterday and here I am, sitting at Charleroi Airport (also known as "Brussels South", quite a misleading name given that it's 80 km from Brussels). I have already passed all controls, check-in and everything. While I wait for boarding, I am watching the shameful spectacle of airport personnel (let me reiterate that: airport personnel, not RyanAir's) enforcing RyanAir's 10 kg cabin baggage limit. According to RyanAir, they want to minimize the weight the plane carries to use less fuel. So far, so good.

Here is what I have seen: people who do not carry any baggage (very few, they have probably checked it in because it exceeded size or weight), people who are below the 10 kg limit and people who are way over it (and have been told to check their luggage in). I am OK with those cases.

There is still a fourth case: people who are slightly over 10 kg. I've seen a woman whose bag was 10.15 kg be told to pay 20 EUR to check her bag, or go back to the RyanAir desk to check in the bag. She opened her bag, took out a scarf, put it on and now the bag met the weight limit. Yes, RyanAir is charging 20 EUR/kg for hand baggage from 10 kg on. What a rip-off. A couple of East-European girls were about 1 kg in excess each. They put a couple of extra jumpers on and now their baggage was under 10 kg. Many people were about 1 kg in excess. They were told that their suitcase had better get lighter or they would pay 20 to 40 EUR. Most of them just took something (camera, food, slippers, whatever) and put it in the pockets of their coats. Fortunately, RyanAir is not charging for body and clothes weight (yet?).

In all those cases the plane will end up transporting the same weight and RyanAir won't get one more dime, so why, RyanAir? Why are you such a shameful company? Why are you enforcing ludicrous and pointless policies? Don't you know that after passing the control everybody just puts everything back into the suitcase? Of course you do.

So after watching this ridiculous spectacle go on for a while, I had a devious idea: let's organize a fat people conference and fly them all over, to and from, using RyanAir. Further, all of them should carry exactly 10 kg of hand baggage.

9 May 2010

Gunnar Wolf: You will not be renumbered

I woke up with a loud BZZZT. It happens every couple of years. The electric transformer for the circuit where my house is located, at the northern edge of Ciudad Universitaria, decided to die (or at least to take a break, literally). About one hour later, I decided it was time to get up and start being a useful person. I gave breakfast to my cats and had breakfast myself, and called the electrical company to report this mishap. They told me the report was registered, and I hope to have electricity soon (meanwhile, I'm sitting at a nearby restaurant, as there is some work to do. Yes, besides writing blog entries). And they told me, as is often said in Mexico, a general anticorruption phrase, with a twist: "recuerde que usted no debe renumerar ningún servicio que realice el personal de la Comisión Federal de Electricidad" (You should not renumber any service done by the Federal Electrical Commission personnel). Yes, remunerado (not renumerado). This shows once again the power of asking people to read things they don't understand.

15 February 2010

William Pitcock: Vierra guys Stump in 2010

[Image: vierra_285]

The band Vierra is to be different in 2010. The male members, Kevin and co., want to look more muscular. Of course there are reasons why the boys of the "Ballerina" band want to look more muscular. Raka Cyril, the guitarist, spoke up to explain. "To create a more muscular body takes some effort." So Kevin (keyboards), Raka (guitar) and Tryan (drums), not forgetting the additional bass player, Deryansha Azhary, are taking up fitness training. "Fitness together can maintain cohesion," said Raka again. The desire for fitness with the new look emerged two weeks ago. At that time it was only an expense. But after thinking it over together, the three Vierra boys agreed to be different in 2010. Apart from the more muscular appearance of its male members, there is no change for the "Ballerina" band in 2010. The band with the hit "Are You Really" is not planning to release a new album. "We do not want to hurry. The album only just came out, it should not go to waste," added Widi, the vocalist.

17 November 2009

Florian Maier: another debian bug squashing party at limux office

To celebrate the move to our new office building, there will be another Debian Bug Squashing Party! The event will take place from 27.11. to 29.11. Here is the Google Maps link. The BSP starts on Friday afternoon at about 17:00. As always, everybody with moderate Unix skills is welcome to participate. Some Debian or programming knowledge is also desirable. For Debian newbies, there will be a packaging introduction on Friday evening. Please prepare your notebooks for a wireless LAN setup, as we're still not done with the cabling after our move to the new office building last week. Last but not least, I'm glad we will be able to sponsor some pizza for the hungry developer crowd ;-) More information about BSPs is available here:

16 November 2009

Erich Schubert: DebConf 2011 in Munich

We'd like to host DebConf 2011 in Munich, Germany.

However, this is a far from trivial challenge: Rent in Munich, in particular for conference rooms, is far from cheap. In my opinion, unless we get some really big sponsor (and I'd still prefer spending sponsor money to fund developer trips to DebConf instead!), the only chance we have is to get some rooms at the university.

However, given the developments of recent years (budget etc.), it has become a lot more difficult to actually get rooms at the university for such events. Unless the event is considered to be fully a part of the university's "work", we might have to pay rent to the university, which again isn't that affordable.

Anyway, if you are in Munich, working at one of the universities, or in any way interested in supporting DebConf 2011 in Munich, please join the DebConf11 Germany mailing list. Also check our meetings scheduled on the DebianMuc Wiki page, currently every Monday, 18:00, at the new LiMux offices in Sonnenstr.

P.S. There will also be a Bug Squashing Party in Munich at the end of November: Munich BSP November 2009

10 November 2009

Yves-Alexis Perez: Call for help: update

OK, so there were some reactions to the Call for help post. I had three direct offers for help in pkg-xfce; I am not sure if other teams had such offers. Some people asked me to correct various numbers for the "active contributors". Basically, the numbers are the feeling I got from people working in those teams. Julien Cristau wants me to correct the number of debian-x active contributors to 0 (yes, zero, that means nobody, nadie, personne). Basically he doesn't have time anymore, and Brice Goglin can't really keep up. So, for those who care about shiny X effects and stuff like that, your help would be gladly appreciated (and no, you don't have to own each and every chipset in the world to give some time). Aurelien Jarno wants me to add that at the moment there are 2 (two) active libc contributors, plus one on GNU/Hurd and one on kfreebsd. Frans Pop wants me to add that there are ~85 people working on d-i and that the problems the team might face aren't only related to the lack of manpower (and I don't really want to enter politics). Finally, it seems that some people (well, only one at the moment, but it's enough for me to feel the need to clarify) thought the numbers previously given would dismiss the contributions of the active contributors. That wasn't my intention, so I apologize if you are an active contributor in one of those teams and thought I dismissed your contribution. If it wasn't clear enough, my point is to show that quite a few teams are lacking manpower (some teams lack other things too, like leadership, coordination or whatever) and users shouldn't be scared to contribute to them. Those are core teams; without them Debian wouldn't work at all (not to mention derivatives), so it's a good idea to join them. Now, what if you do want to help, but don't know how? In the previous post I gave links to the teams' websites, wiki pages or QA pages. You should be able to find a mailing list or contact address you can write to.
Just write that you want to offer some help, that you don't know how and where to start. Add what you're interested in, what you find fun, and your technical knowledge. Don't be shy, and you don't need to be a Debian Developer (nor even a Debian Maintainer) to contribute. Thanks!

Michael Banck: 10 Nov 2009

Bug-Squashing-Party in Munich

We are organizing a BSP in Munich on the last weekend of November (28th/29th). It will take place in the LiMux office (the new one, they are moving to the neighboring building this week) on Sonnenstr. 25, between the U-Bahn stations "Stachus" and "Sendlinger Tor". If you are from outside Munich and want to attend the BSP, please let me know (mbanck@debian.org) so we can maybe arrange something like limited travel sponsorship or lodging (some of us can offer crash space at least). We especially invite people from within 150 km, like Nuremberg/Erlangen, Salzburg, Ulm, Augsburg and Innsbruck. We will probably start the BSP at some point on Friday evening already, but the main action will be on Saturday and Sunday. As usual, people should bring their notebooks and possibly an ethernet cable. Wireless will be present as well, but a certain bandwidth cannot be guaranteed.

25 September 2009

Gunnar Wolf: Honduras: .hn NIC attacked/intervened by the de-facto government authorities

I was requested to forward this information to as wide an audience as possible. Possibly two months ago the legality/legitimacy of the actions carried out by the Honduran armed forces, which captured a democratically elected president and, without a judicial order or trial process, forced him out of the country, starting a de-facto government, was something questionable. Each day, however, it becomes clearer and clearer that the Hondurans are suffering under a repressive military-backed system which cannot be trusted to conduct fair, credible elections. I got this message from a Honduran friend (whose identity I am of course not divulging) denouncing the government's invasion of the .hn domain name registry, which is handled by the Sustainable Development Network (Red de Desarrollo Sustentable, RDS-HN). The National Telecommunications Commission (Comisión Nacional de Telecomunicaciones, CONATEL) demands that all domain name registration under the .hn top-level domain (TLD) be suspended, and that all the lists and databases regarding said TLD be handed over, detailing the IP ranges and the people responsible. They did this under the argument that RDS-HN is an Internet Service Provider (which it is not. Being a registrar means they are responsible for the well-keeping of public information and for handling a public good, the .hn TLD, not that they provide any kind of regulated service to individuals or organizations), with military personnel disguised as civilians (and who refused to identify themselves).
If you are interested, please read further in the text I received straight from my Honduran contacts (Spanish) (or its inaccurate but often helpful automated translation to English, done through Google Translate). Even though this information is normally accessible via WHOIS and similar services (this only states clearly that nobody in CONATEL was able to do what I just did legally and anonymously from my personal workstation), they did it in such a fashion in order to scare the operators and society. Honduras is going through a very hard process. Whatever happens there will likely have an impact on the future reactions of the most retrograde and powerful sectors of society in the rest of Latin America. We do our best (even if, as non-Hondurans living outside Honduras, it only means raising our voices) to avoid the risk of our region going back to the sad, cruel and bloody history of the 1970s. [update] My friend Mave, who works at NIC Chile, sent as a comment to this post LACTLD's official stand in this regard (Spanish; English version also available). LACTLD (Latin American and the Caribbean ccTLD's Organization) clearly backs RDS-HN and condemns the illegal government's actions.
Attachment: Detailing the intervention attempt of RDS-HN by the de-facto government agents (Spanish), 8.61 KB
